Overview

Dataset statistics

Number of variables: 17
Number of observations: 510219
Missing cells: 1260319
Missing cells (%): 14.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 66.2 MiB
Average record size in memory: 136.0 B
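The derived figures in this table can be checked from the raw counts. The sketch below is only an illustration: it assumes that the missing-cell percentage is taken over variables × observations and that the record size is total memory divided by the number of rows. The variable names are made up for the example.

```python
# Figures copied from the dataset statistics; the names are illustrative.
n_variables = 17
n_observations = 510_219
missing_cells = 1_260_319
size_mib = 66.2

# Assumed definitions: share of missing cells over all cells, bytes per row.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
bytes_per_record = size_mib * 1024 ** 2 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: about {bytes_per_record:.0f} B")
```

Because 66.2 MiB is itself a rounded figure, the recomputed record size agrees with the reported 136.0 B only to within rounding.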

Variable types

Numeric: 8
Categorical: 8
Unsupported: 1

Alerts

Date has a high cardinality: 577 distinct values (High cardinality)
DayOfWeek is highly correlated with Open (High correlation)
Open is highly correlated with DayOfWeek (High correlation)
StoreType is highly correlated with Assortment (High correlation)
Assortment is highly correlated with StoreType (High correlation)
PromoInterval is highly correlated with Promo2 (High correlation)
Promo2 is highly correlated with PromoInterval (High correlation)
CompetitionOpenSinceYear is highly correlated with Promo2SinceWeek (High correlation)
Promo2SinceWeek is highly correlated with CompetitionOpenSinceYear and 2 other fields (High correlation)
Promo2SinceYear is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
PromoInterval is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
DayOfWeek has 15164 (3.0%) missing values (Missing)
Open has 15396 (3.0%) missing values (Missing)
Promo has 15368 (3.0%) missing values (Missing)
StateHoliday has 15558 (3.0%) missing values (Missing)
SchoolHoliday has 15535 (3.0%) missing values (Missing)
StoreType has 15409 (3.0%) missing values (Missing)
Assortment has 15409 (3.0%) missing values (Missing)
CompetitionDistance has 16714 (3.3%) missing values (Missing)
CompetitionOpenSinceMonth has 172595 (33.8%) missing values (Missing)
CompetitionOpenSinceYear has 172595 (33.8%) missing values (Missing)
Promo2 has 15409 (3.0%) missing values (Missing)
Promo2SinceWeek has 258389 (50.6%) missing values (Missing)
Promo2SinceYear has 258389 (50.6%) missing values (Missing)
PromoInterval has 258389 (50.6%) missing values (Missing)
df_index is uniformly distributed (Uniform)
df_index has unique values (Unique)
StateHoliday is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Store has 15409 (3.0%) zeros (Zeros)
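The zero count for Store (15409) is identical to the missing count reported for StoreType, Assortment and Promo2. One possible reading, which the report itself does not state, is that 0 is a placeholder for an unknown store ID. Under that assumption, a minimal sketch for turning the placeholder into an explicit missing value could look like this (the function name and sample values are hypothetical):

```python
def zeros_to_missing(store_ids):
    """Return a copy of store_ids with the assumed 0 placeholder replaced by None."""
    return [None if s == 0 else s for s in store_ids]


# Hypothetical sample: valid store IDs mixed with the 0 placeholder.
sample = [83, 0, 685, 0, 1115]
cleaned = zeros_to_missing(sample)
n_missing = sum(v is None for v in cleaned)
print(cleaned, n_missing)
```

Whether 0 really means "unknown" should be confirmed against the data source before applying such a step.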

Reproduction

Analysis started: 2021-10-28 10:10:55.679360
Analysis finished: 2021-10-28 10:11:44.805870
Duration: 49.13 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 510219
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 318838.4463
Minimum: 0
Maximum: 637772
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 3.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 31846.8
Q1: 159407.5
median: 318831
Q3: 478200.5
95-th percentile: 605804.1
Maximum: 637772
Range: 637772
Interquartile range (IQR): 318793

Descriptive statistics

Standard deviation: 184057.5651
Coefficient of variation (CV): 0.577275317
Kurtosis: -1.199221986
Mean: 318838.4463
Median Absolute Deviation (MAD): 159396
Skewness: 7.016436764 × 10⁻⁵
Sum: 1.626774332 × 10¹¹
Variance: 3.387718729 × 10¹⁰
Monotonicity: Not monotonic
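Several of the statistics listed for df_index follow from one another, and they are consistent with the Uniform alert. The sketch below recomputes them from the reported numbers; it assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 − Q1) and compares the result with an ideal continuous uniform distribution on [minimum, maximum], whose standard deviation is (max − min) / √12 and whose excess kurtosis is −1.2.

```python
import math

# Values copied from the df_index statistics.
mean = 318838.4463
std = 184057.5651
q1, q3 = 159407.5, 478200.5
minimum, maximum = 0, 637772

cv = std / mean           # coefficient of variation
variance = std ** 2       # variance is the squared standard deviation
iqr = q3 - q1             # interquartile range

# Standard deviation of an ideal continuous uniform distribution on [min, max].
uniform_std = (maximum - minimum) / math.sqrt(12)

print(f"CV = {cv:.6f}, variance = {variance:.6e}, IQR = {iqr}")
print(f"uniform std = {uniform_std:.1f} vs reported {std:.1f}")
```

The reported kurtosis of about −1.199 is likewise close to the −1.2 expected for a uniform distribution.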
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
143027 | 1 | < 0.1%
548004 | 1 | < 0.1%
635246 | 1 | < 0.1%
55701 | 1 | < 0.1%
384229 | 1 | < 0.1%
570962 | 1 | < 0.1%
326679 | 1 | < 0.1%
206818 | 1 | < 0.1%
456170 | 1 | < 0.1%
51552 | 1 | < 0.1%
Other values (510209) | 510209 | > 99.9%

Extreme values (smallest)
Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%
12 | 1 | < 0.1%
13 | 1 | < 0.1%

Extreme values (largest)
Value | Count | Frequency (%)
637772 | 1 | < 0.1%
637771 | 1 | < 0.1%
637770 | 1 | < 0.1%
637769 | 1 | < 0.1%
637768 | 1 | < 0.1%
637767 | 1 | < 0.1%
637766 | 1 | < 0.1%
637763 | 1 | < 0.1%
637762 | 1 | < 0.1%
637761 | 1 | < 0.1%

Date
Categorical

HIGH CARDINALITY

Distinct: 577
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.9 MiB
2014-04-28: 928
2014-04-14: 920
2014-03-23: 920
2013-08-17: 919
2014-02-20: 918
Other values (572): 505614

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2013-05-09
2nd row: 2014-05-12
3rd row: 2014-06-13
4th row: 2013-07-23
5th row: 2014-05-19

Common Values

Value | Count | Frequency (%)
2014-04-28 | 928 | 0.2%
2014-04-14 | 920 | 0.2%
2014-03-23 | 920 | 0.2%
2013-08-17 | 919 | 0.2%
2014-02-20 | 918 | 0.2%
2013-12-21 | 918 | 0.2%
2013-05-18 | 917 | 0.2%
2014-03-22 | 917 | 0.2%
2013-06-17 | 917 | 0.2%
2013-05-06 | 917 | 0.2%
Other values (567) | 501028 | 98.2%

Length

Histogram of lengths of the category


Store
Real number (ℝ≥0)

ZEROS

Distinct: 1116
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 541.4548204
Minimum: 0
Maximum: 1115
Zeros: 15409
Zeros (%): 3.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 23
Q1: 253
median: 541
Q3: 829
95-th percentile: 1058
Maximum: 1115
Range: 1115
Interquartile range (IQR): 576

Descriptive statistics

Standard deviation: 331.2175032
Coefficient of variation (CV): 0.6117177108
Kurtosis: -1.211137323
Mean: 541.4548204
Median Absolute Deviation (MAD): 288
Skewness: 0.007633122766
Sum: 276260537
Variance: 109705.0344
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 15409 | 3.0%
943 | 478 | 0.1%
240 | 475 | 0.1%
1062 | 475 | 0.1%
621 | 475 | 0.1%
979 | 474 | 0.1%
225 | 473 | 0.1%
910 | 472 | 0.1%
90 | 472 | 0.1%
580 | 472 | 0.1%
Other values (1106) | 490544 | 96.1%

Extreme values (smallest)
Value | Count | Frequency (%)
0 | 15409 | 3.0%
1 | 443 | 0.1%
2 | 459 | 0.1%
3 | 447 | 0.1%
4 | 455 | 0.1%
5 | 455 | 0.1%
6 | 441 | 0.1%
7 | 451 | 0.1%
8 | 425 | 0.1%
9 | 446 | 0.1%

Extreme values (largest)
Value | Count | Frequency (%)
1115 | 466 | 0.1%
1114 | 460 | 0.1%
1113 | 447 | 0.1%
1112 | 454 | 0.1%
1111 | 459 | 0.1%
1110 | 438 | 0.1%
1109 | 428 | 0.1%
1108 | 447 | 0.1%
1107 | 420 | 0.1%
1106 | 435 | 0.1%

DayOfWeek
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 15164
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.994782398
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 4
Q3: 6
95-th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.997871586
Coefficient of variation (CV): 0.5001202536
Kurtosis: -1.247656306
Mean: 3.994782398
Median Absolute Deviation (MAD): 2
Skewness: 0.005520834778
Sum: 1977637
Variance: 3.991490874
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values
Value | Count | Frequency (%)
2 | 71230 | 14.0%
4 | 71166 | 13.9%
3 | 71036 | 13.9%
1 | 70486 | 13.8%
7 | 70424 | 13.8%
6 | 70386 | 13.8%
5 | 70327 | 13.8%
(Missing) | 15164 | 3.0%

Extreme values (smallest)
Value | Count | Frequency (%)
1 | 70486 | 13.8%
2 | 71230 | 14.0%
3 | 71036 | 13.9%
4 | 71166 | 13.9%
5 | 70327 | 13.8%
6 | 70386 | 13.8%
7 | 70424 | 13.8%

Extreme values (largest)
Value | Count | Frequency (%)
7 | 70424 | 13.8%
6 | 70386 | 13.8%
5 | 70327 | 13.8%
4 | 71166 | 13.9%
3 | 71036 | 13.9%
2 | 71230 | 14.0%
1 | 70486 | 13.8%

Open
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 15396
Missing (%): 3.0%
Memory size: 3.9 MiB
1.0: 410312
0.0: 84511

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
1.0 | 410312 | 80.4%
0.0 | 84511 | 16.6%
(Missing) | 15396 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1.0 | 410312 | 82.9%
0.0 | 84511 | 17.1%
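The percentages listed under Common Values and under the pie chart differ because they use different denominators. The sketch below reproduces both sets from the counts reported for Open, assuming the first is taken over all rows (including missing) and the second over non-missing rows only. This matches the numbers but is an inference, not something the report states.

```python
# Counts copied from the Open variable.
counts = {"1.0": 410_312, "0.0": 84_511}
n_missing = 15_396

n_present = sum(counts.values())     # rows that have a value
n_total = n_present + n_missing      # all rows, including missing ones

for value, count in counts.items():
    share_all = 100 * count / n_total
    share_present = 100 * count / n_present
    print(f"{value}: {share_all:.1f}% of all rows, "
          f"{share_present:.1f}% of non-missing rows")
```

The same pattern explains the two sets of percentages shown for the other low-cardinality categorical variables.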


Promo
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 15368
Missing (%): 3.0%
Memory size: 3.9 MiB
0.0: 311529
1.0: 183322

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 1.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 311529 | 61.1%
1.0 | 183322 | 35.9%
(Missing) | 15368 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 311529 | 63.0%
1.0 | 183322 | 37.0%


StateHoliday
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 15558
Missing (%): 3.0%
Memory size: 3.9 MiB
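The report does not say why StateHoliday is flagged as unsupported. The sample rows at the end of the report show the column holding 0, 0.0 and a side by side, so mixed value types are one plausible cause. Assuming that is the case, a cleaning step along the following lines might make the column analysable; the function name and the mapping of every zero-like encoding to the string "0" are illustrative choices, not part of the report.

```python
import math


def normalize_state_holiday(value):
    """Map mixed encodings such as 0, 0.0, "0" and "0.0" to the string "0"."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None                    # keep missing values missing
    if isinstance(value, (int, float)):
        return str(int(value))         # numeric 0 and 0.0 become "0"
    text = str(value).strip()
    return "0" if text in {"0", "0.0"} else text


# Hypothetical raw values mirroring the encodings seen in the sample rows.
raw = ["0", 0, 0.0, "0.0", "a", float("nan")]
clean = [normalize_state_holiday(v) for v in raw]
print(clean)
```

After such a step the column would contain a single string encoding per category and could be profiled as a categorical variable.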

SchoolHoliday
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 15535
Missing (%): 3.0%
Memory size: 3.9 MiB
0.0: 408945
1.0: 85739

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 1.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 408945 | 80.2%
1.0 | 85739 | 16.8%
(Missing) | 15535 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0.0 | 408945 | 82.7%
1.0 | 85739 | 17.3%


StoreType
Categorical

HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 15409
Missing (%): 3.0%
Memory size: 3.9 MiB
a: 267479
d: 153820
c: 65943
b: 7568

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a
2nd row: a
3rd row: a
4th row: a
5th row: d

Common Values

Value | Count | Frequency (%)
a | 267479 | 52.4%
d | 153820 | 30.1%
c | 65943 | 12.9%
b | 7568 | 1.5%
(Missing) | 15409 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
a | 267479 | 54.1%
d | 153820 | 31.1%
c | 65943 | 13.3%
b | 7568 | 1.5%


Assortment
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 15409
Missing (%): 3.0%
Memory size: 3.9 MiB
a: 262798
c: 228012
b: 4000

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a
2nd row: c
3rd row: a
4th row: a
5th row: a

Common Values

Value | Count | Frequency (%)
a | 262798 | 51.5%
c | 228012 | 44.7%
b | 4000 | 0.8%
(Missing) | 15409 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
a | 262798 | 53.1%
c | 228012 | 46.1%
b | 4000 | 0.8%


CompetitionDistance
Real number (ℝ≥0)

MISSING

Distinct: 654
Distinct (%): 0.1%
Missing: 16714
Missing (%): 3.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 5416.801025
Minimum: 20
Maximum: 75860
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.9 MiB

Quantile statistics

Minimum: 20
5-th percentile: 140
Q1: 710
median: 2330
Q3: 6890
95-th percentile: 20260
Maximum: 75860
Range: 75840
Interquartile range (IQR): 6180

Descriptive statistics

Standard deviation: 7689.737858
Coefficient of variation (CV): 1.419608699
Kurtosis: 13.05281421
Mean: 5416.801025
Median Absolute Deviation (MAD): 1980
Skewness: 2.931106981
Sum: 2673218390
Variance: 59132068.33
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
250 | 5357 | 1.0%
1200 | 3895 | 0.8%
350 | 3606 | 0.7%
190 | 3603 | 0.7%
50 | 3546 | 0.7%
330 | 3166 | 0.6%
180 | 3131 | 0.6%
90 | 3118 | 0.6%
150 | 3102 | 0.6%
2640 | 2680 | 0.5%
Other values (644) | 458301 | 89.8%
(Missing) | 16714 | 3.3%

Extreme values (smallest)
Value | Count | Frequency (%)
20 | 447 | 0.1%
30 | 1816 | 0.4%
40 | 2219 | 0.4%
50 | 3546 | 0.7%
60 | 1316 | 0.3%
70 | 2198 | 0.4%
80 | 1311 | 0.3%
90 | 3118 | 0.6%
100 | 2240 | 0.4%
110 | 2648 | 0.5%

Extreme values (largest)
Value | Count | Frequency (%)
75860 | 455 | 0.1%
58260 | 444 | 0.1%
48330 | 457 | 0.1%
46590 | 463 | 0.1%
45740 | 452 | 0.1%
44320 | 438 | 0.1%
40860 | 439 | 0.1%
40540 | 464 | 0.1%
38710 | 450 | 0.1%
38630 | 472 | 0.1%

CompetitionOpenSinceMonth
Real number (ℝ≥0)

MISSING

Distinct: 12
Distinct (%): < 0.1%
Missing: 172595
Missing (%): 33.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.22262043
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 8
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.211096101
Coefficient of variation (CV): 0.4445887933
Kurtosis: -1.244118275
Mean: 7.22262043
Median Absolute Deviation (MAD): 3
Skewness: -0.1688326784
Sum: 2438530
Variance: 10.31113817
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common values
Value | Count | Frequency (%)
9 | 55361 | 10.9%
4 | 41928 | 8.2%
11 | 40796 | 8.0%
3 | 31023 | 6.1%
7 | 29573 | 5.8%
12 | 28417 | 5.6%
10 | 27040 | 5.3%
6 | 22211 | 4.4%
5 | 19427 | 3.8%
2 | 18287 | 3.6%
Other values (2) | 23561 | 4.6%
(Missing) | 172595 | 33.8%

Extreme values (smallest)
Value | Count | Frequency (%)
1 | 6162 | 1.2%
2 | 18287 | 3.6%
3 | 31023 | 6.1%
4 | 41928 | 8.2%
5 | 19427 | 3.8%
6 | 22211 | 4.4%
7 | 29573 | 5.8%
8 | 17399 | 3.4%
9 | 55361 | 10.9%
10 | 27040 | 5.3%

Extreme values (largest)
Value | Count | Frequency (%)
12 | 28417 | 5.6%
11 | 40796 | 8.0%
10 | 27040 | 5.3%
9 | 55361 | 10.9%
8 | 17399 | 3.4%
7 | 29573 | 5.8%
6 | 22211 | 4.4%
5 | 19427 | 3.8%
4 | 41928 | 8.2%
3 | 31023 | 6.1%

CompetitionOpenSinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 23
Distinct (%): < 0.1%
Missing: 172595
Missing (%): 33.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 2008.678403
Minimum: 1900
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.9 MiB

Quantile statistics

Minimum: 1900
5-th percentile: 2001
Q1: 2006
median: 2010
Q3: 2013
95-th percentile: 2015
Maximum: 2015
Range: 115
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 6.102473023
Coefficient of variation (CV): 0.003038053784
Kurtosis: 125.0945533
Mean: 2008.678403
Median Absolute Deviation (MAD): 3
Skewness: -7.802609127
Sum: 678178037
Variance: 37.24017699
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)

Common values
Value | Count | Frequency (%)
2013 | 36762 | 7.2%
2012 | 36188 | 7.1%
2014 | 31126 | 6.1%
2005 | 27475 | 5.4%
2010 | 24577 | 4.8%
2011 | 24011 | 4.7%
2009 | 23951 | 4.7%
2008 | 23938 | 4.7%
2007 | 21308 | 4.2%
2006 | 20816 | 4.1%
Other values (13) | 67472 | 13.2%
(Missing) | 172595 | 33.8%

Extreme values (smallest)
Value | Count | Frequency (%)
1900 | 409 | 0.1%
1961 | 460 | 0.1%
1990 | 2240 | 0.4%
1994 | 889 | 0.2%
1995 | 844 | 0.2%
1998 | 455 | 0.1%
1999 | 3519 | 0.7%
2000 | 4433 | 0.9%
2001 | 7135 | 1.4%
2002 | 12009 | 2.4%

Extreme values (largest)
Value | Count | Frequency (%)
2015 | 16896 | 3.3%
2014 | 31126 | 6.1%
2013 | 36762 | 7.2%
2012 | 36188 | 7.1%
2011 | 24011 | 4.7%
2010 | 24577 | 4.8%
2009 | 23951 | 4.7%
2008 | 23938 | 4.7%
2007 | 21308 | 4.2%
2006 | 20816 | 4.1%

Promo2
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 15409
Missing (%): 3.0%
Memory size: 3.9 MiB
1.0: 251830
0.0: 242980

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 0.0
4th row: 1.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
1.0 | 251830 | 49.4%
0.0 | 242980 | 47.6%
(Missing) | 15409 | 3.0%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
1.0 | 251830 | 50.9%
0.0 | 242980 | 49.1%


Promo2SinceWeek
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 24
Distinct (%): < 0.1%
Missing: 258389
Missing (%): 50.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.51165469
Minimum: 1
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 13
median: 22
Q3: 37
95-th percentile: 45
Maximum: 50
Range: 49
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 14.12712419
Coefficient of variation (CV): 0.6008562296
Kurtosis: -1.383556579
Mean: 23.51165469
Median Absolute Deviation (MAD): 13
Skewness: 0.08278416867
Sum: 5920940
Variance: 199.5756378
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)

Common values
Value | Count | Frequency (%)
14 | 36017 | 7.1%
40 | 33070 | 6.5%
31 | 19446 | 3.8%
10 | 18654 | 3.7%
5 | 17310 | 3.4%
37 | 15625 | 3.1%
1 | 15603 | 3.1%
13 | 14918 | 2.9%
45 | 14898 | 2.9%
22 | 14467 | 2.8%
Other values (14) | 51822 | 10.2%
(Missing) | 258389 | 50.6%

Extreme values (smallest)
Value | Count | Frequency (%)
1 | 15603 | 3.1%
5 | 17310 | 3.4%
6 | 460 | 0.1%
9 | 6167 | 1.2%
10 | 18654 | 3.7%
13 | 14918 | 2.9%
14 | 36017 | 7.1%
18 | 12889 | 2.5%
22 | 14467 | 2.8%
23 | 2160 | 0.4%

Extreme values (largest)
Value | Count | Frequency (%)
50 | 460 | 0.1%
49 | 426 | 0.1%
48 | 4038 | 0.8%
45 | 14898 | 2.9%
44 | 1344 | 0.3%
40 | 33070 | 6.5%
39 | 2599 | 0.5%
37 | 15625 | 3.1%
36 | 4421 | 0.9%
35 | 11145 | 2.2%

Promo2SinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 258389
Missing (%): 50.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.756713
Minimum: 2009
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.9 MiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2011
median: 2012
Q3: 2013
95-th percentile: 2014
Maximum: 2015
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.668860197
Coefficient of variation (CV): 0.0008295536863
Kurtosis: -1.053934856
Mean: 2011.756713
Median Absolute Deviation (MAD): 1
Skewness: -0.1165318257
Sum: 506620693
Variance: 2.785094357
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values
Value | Count | Frequency (%)
2011 | 56761 | 11.1%
2013 | 53255 | 10.4%
2014 | 40985 | 8.0%
2012 | 36056 | 7.1%
2009 | 32235 | 6.3%
2010 | 28128 | 5.5%
2015 | 4410 | 0.9%
(Missing) | 258389 | 50.6%

Extreme values (smallest)
Value | Count | Frequency (%)
2009 | 32235 | 6.3%
2010 | 28128 | 5.5%
2011 | 56761 | 11.1%
2012 | 36056 | 7.1%
2013 | 53255 | 10.4%
2014 | 40985 | 8.0%
2015 | 4410 | 0.9%

Extreme values (largest)
Value | Count | Frequency (%)
2015 | 4410 | 0.9%
2014 | 40985 | 8.0%
2013 | 53255 | 10.4%
2012 | 36056 | 7.1%
2011 | 56761 | 11.1%
2010 | 28128 | 5.5%
2009 | 32235 | 6.3%

PromoInterval
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 258389
Missing (%): 50.6%
Memory size: 3.9 MiB
Jan,Apr,Jul,Oct: 146974
Feb,May,Aug,Nov: 57718
Mar,Jun,Sept,Dec: 47138

Length

Max length: 16
Median length: 15
Mean length: 15.18718183
Min length: 15

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Mar,Jun,Sept,Dec
2nd row: Jan,Apr,Jul,Oct
3rd row: Mar,Jun,Sept,Dec
4th row: Jan,Apr,Jul,Oct
5th row: Jan,Apr,Jul,Oct

Common Values

Value | Count | Frequency (%)
Jan,Apr,Jul,Oct | 146974 | 28.8%
Feb,May,Aug,Nov | 57718 | 11.3%
Mar,Jun,Sept,Dec | 47138 | 9.2%
(Missing) | 258389 | 50.6%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
jan,apr,jul,oct | 146974 | 58.4%
feb,may,aug,nov | 57718 | 22.9%
mar,jun,sept,dec | 47138 | 18.7%


Interactions

[Figure: 8 × 8 grid of pairwise interaction plots for the numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
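As an illustration of that definition, the sketch below ranks both variables (ties receive the mean of their positions, which is a common convention and an assumption here) and then applies the ordinary Pearson formula to the ranks. The function names and the sample data are made up for the example; this is not the code the report itself runs.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear
print(spearman(x, y), pearson(x, y))
```

For this monotonic but nonlinear relationship, ρ is 1 (up to floating-point error) while Pearson's r stays below 1.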

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
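A minimal sketch of this calculation, together with a check of the location and scale invariance mentioned above, is given here. The helper name and the data are hypothetical, and the population (divide-by-n) forms of covariance and standard deviation are used; the choice of n or n − 1 cancels in the ratio.

```python
def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_transformed)
```

Both calls give the same value; a negative scale factor on one variable would flip the sign of r.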

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
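The pair-counting definition can be sketched directly. The version below is the simple variant often called tau-a, which makes no adjustment for ties; many libraries report a tie-adjusted variant instead, so results on tied data may differ. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 − 1) / 6.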

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
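For orientation, a sketch of the bias-corrected estimator as it is usually written is given below: φ² = χ²/n is reduced by (k − 1)(r − 1)/(n − 1) and floored at zero, and the table dimensions k and r are corrected in the same spirit. This is an illustration under those assumptions, not the report's own implementation; the function name and the example tables are invented, and the code assumes no row or column total is zero.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


independent = [[20, 30], [40, 60]]   # rows are proportional: no association
associated = [[50, 0], [0, 50]]      # each row maps to exactly one column
print(cramers_v_corrected(independent), cramers_v_corrected(associated))
```

The first table gives 0 and the second gives 1 (up to floating-point error), matching the two ends of the range described above.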

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for it.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
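Nullity correlation, as described for the heatmap, can be illustrated by correlating presence indicators (1 if a value is present, 0 if it is missing). The sketch below assumes a plain Pearson correlation on those indicators and uses the first five sample rows of three columns, with NaN written as None. The helper name is hypothetical and this is not the plotting library's own code.

```python
def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = sum((p - ma) ** 2 for p in a) ** 0.5
    sb = sum((q - mb) ** 2 for q in b) ** 0.5
    if sa == 0 or sb == 0:
        return None  # undefined when a column is always present or always missing
    return cov / (sa * sb)


# First five sample rows of three columns.
promo2_since_week = [None, 35.0, None, 37.0, None]
promo_interval = [None, "Mar,Jun,Sept,Dec", None, "Jan,Apr,Jul,Oct", None]
competition_open_since_month = [None, None, 10.0, 11.0, 10.0]

print(nullity_correlation(promo2_since_week, promo_interval))
print(nullity_correlation(promo2_since_week, competition_open_since_month))
```

In this small sample Promo2SinceWeek and PromoInterval are always missing together, giving a correlation of 1, which is consistent with their identical missing counts (258389) in the full dataset.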

Sample

First rows

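(index) | df_index | Date | Store | DayOfWeek | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval
0 | 143027 | 2013-05-09 | 83 | 4.0 | 0.0 | NaN | a | 0.0 | a | a | 2710.0 | NaN | NaN | 0.0 | NaN | NaN | NaN
1 | 553420 | 2014-05-12 | 68 | 1.0 | NaN | 0.0 | 0 | 0.0 | a | c | 250.0 | NaN | NaN | 1.0 | 35.0 | 2012.0 | Mar,Jun,Sept,Dec
2 | 589010 | 2014-06-13 | 88 | 5.0 | 1.0 | 0.0 | 0 | 0.0 | a | a | 10690.0 | 10.0 | 2005.0 | 0.0 | NaN | NaN | NaN
3 | 227227 | 2013-07-23 | 685 | 2.0 | 1.0 | 0.0 | 0 | 1.0 | a | a | 650.0 | 11.0 | 2013.0 | 1.0 | 37.0 | 2009.0 | Jan,Apr,Jul,Oct
4 | 561723 | 2014-05-19 | 689 | 1.0 | 1.0 | 1.0 | 0 | 0.0 | d | a | 15040.0 | 10.0 | 2004.0 | 0.0 | NaN | NaN | NaN
5 | 143281 | 2013-05-09 | 936 | 4.0 | 0.0 | 0.0 | a | 0.0 | a | a | 580.0 | 2.0 | 2008.0 | 0.0 | NaN | NaN | NaN
6 | 562214 | 2014-05-20 | 474 | 2.0 | 1.0 | 1.0 | 0 | 0.0 | c | a | 14810.0 | NaN | NaN | 1.0 | 14.0 | 2011.0 | Mar,Jun,Sept,Dec
7 | 365709 | 2013-11-24 | 739 | 7.0 | NaN | 0.0 | 0 | 0.0 | d | c | 2770.0 | 6.0 | 2008.0 | 1.0 | 22.0 | 2011.0 | Jan,Apr,Jul,Oct
8 | 509451 | 2014-04-02 | 667 | 3.0 | 1.0 | 1.0 | 0.0 | 0.0 | d | c | 2870.0 | 9.0 | 2012.0 | 0.0 | NaN | NaN | NaN
9 | 387212 | 2013-12-14 | 84 | NaN | 1.0 | 0.0 | 0 | 0.0 | a | c | 11810.0 | 8.0 | 2014.0 | 0.0 | NaN | NaN | NaN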

Last rows

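(index) | df_index | Date | Store | DayOfWeek | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval
510209 | 175203 | 2013-06-07 | 510 | 5.0 | 1.0 | 1.0 | 0 | 0.0 | a | c | 8260.0 | NaN | NaN | 0.0 | NaN | NaN | NaN
510210 | 87498 | 2013-03-20 | 200 | 3.0 | 1.0 | 1.0 | 0 | 0.0 | a | a | 1650.0 | 10.0 | 2000.0 | 0.0 | NaN | NaN | NaN
510211 | 521430 | 2014-04-13 | 1060 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | a | c | 3430.0 | NaN | NaN | 1.0 | 31.0 | 2013.0 | Feb,May,Aug,Nov
510212 | 137337 | 2013-05-04 | 527 | 6.0 | 1.0 | 0.0 | 0 | 0.0 | d | c | 5830.0 | 4.0 | 2008.0 | 0.0 | NaN | NaN | NaN
510213 | 54886 | 2013-02-19 | 477 | 2.0 | 1.0 | 1.0 | 0 | 0.0 | d | a | 770.0 | 7.0 | 2010.0 | 1.0 | 35.0 | 2010.0 | Jan,Apr,Jul,Oct
510214 | 110268 | 2013-04-09 | 774 | 2.0 | 1.0 | 1.0 | 0 | 1.0 | a | c | 640.0 | 9.0 | 2013.0 | 0.0 | NaN | NaN | NaN
510215 | 259178 | 2013-08-21 | 159 | 3.0 | 1.0 | 0.0 | 0 | 1.0 | d | a | 8530.0 | 3.0 | 2013.0 | 0.0 | NaN | NaN | NaN
510216 | 365838 | 2013-11-25 | 332 | 1.0 | 1.0 | 0.0 | 0 | 0.0 | a | a | 1840.0 | 3.0 | 2006.0 | 0.0 | NaN | NaN | NaN
510217 | 131932 | 2013-04-29 | 17 | 1.0 | 1.0 | 1.0 | NaN | 0.0 | a | a | 50.0 | 12.0 | 2005.0 | 1.0 | 26.0 | 2010.0 | Jan,Apr,Jul,Oct
510218 | 121958 | 2013-04-20 | 234 | 6.0 | 1.0 | 0.0 | 0 | 0.0 | d | a | 4370.0 | NaN | NaN | 0.0 | NaN | NaN | NaN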